{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "O8hkxXtRCzJj"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_06_2_cnn.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "YDTXd8-Lmp8Q"
   },
   "source": [
    "# T81-558: Applications of Deep Neural Networks\n",
    "**Module 6: Convolutional Neural Networks (CNN) for Computer Vision**\n",
    "* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)\n",
    "* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ncNrAEpzmp8S"
   },
   "source": [
    "# Module 6 Material\n",
    "\n",
    "* Part 6.1: Image Processing in Python [[Video]](https://www.youtube.com/watch?v=V-IUrfTJMm4&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_06_1_python_images.ipynb)\n",
    "* **Part 6.2: Using Convolutional Neural Networks** [[Video]](https://www.youtube.com/watch?v=nU_T2PPigUQ&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_06_2_cnn.ipynb)\n",
    "* Part 6.3: Using Pretrained Neural Networks with Keras [[Video]](https://www.youtube.com/watch?v=TXqI9fp0imI&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_06_3_resnet.ipynb)\n",
    "* Part 6.4: Looking at Keras Generators and Image Augmentation [[Video]](https://www.youtube.com/watch?v=epfpxiXRL3U&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_06_4_keras_images.ipynb)\n",
    "* Part 6.5: Recognizing Multiple Images with YOLOv5 [[Video]](https://www.youtube.com/watch?v=zwEmzElquHw&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_06_5_yolo.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "6hj6qzYNCzJl"
   },
   "source": [
    "# Google CoLab Instructions\n",
    "\n",
    "The following code ensures that Google CoLab is running the correct version of TensorFlow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "fU9UhAxTmp8S",
    "outputId": "b06312b1-829a-4621-f1f3-d5d0ed2aa0c0"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Note: using Google CoLab\n"
     ]
    }
   ],
   "source": [
    "try:\n",
    "    %tensorflow_version 2.x\n",
    "    COLAB = True\n",
    "    print(\"Note: using Google CoLab\")\n",
    "except:\n",
    "    print(\"Note: not using Google CoLab\")\n",
    "    COLAB = False\n",
    "\n",
    "# Nicely formatted time string\n",
    "def hms_string(sec_elapsed):\n",
    "    h = int(sec_elapsed / (60 * 60))\n",
    "    m = int((sec_elapsed % (60 * 60)) / 60)\n",
    "    s = sec_elapsed % 60\n",
    "    return f\"{h}:{m:>02}:{s:>05.2f}\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Jf_otSJdmp8k"
   },
   "source": [
    "# Part 6.2: Keras Neural Networks for Digits and Fashion MNIST\n",
    "\n",
    "This module will focus on computer vision. There are some important differences and similarities with previous neural networks.\n",
    "\n",
    "* We will usually use classification, though regression is still an option.\n",
    "* The input to the neural network is now 3D (height, width, color)\n",
    "* Data are not transformed; no z-scores or dummy variables.\n",
    "* Processing time is much longer.\n",
    "* We now have different layer times: dense layers (just like before), convolution layers, and max-pooling layers.\n",
    "* Data will no longer arrive as CSV files. TensorFlow provides some utilities for going directly from the image to the input for a neural network.\n",
    "\n",
    "\n",
    "## Common Computer Vision Data Sets\n",
    "\n",
    "There are many data sets for computer vision. Two of the most popular classic datasets are the MNIST digits data set and the CIFAR image data sets. We will not use either of these datasets in this course, but it is important to be familiar with them since neural network texts often refer to them.\n",
    "\n",
    "The [MNIST Digits Data Set](http://yann.lecun.com/exdb/mnist/) is very popular in the neural network research community. You can see a sample of it in Figure 6.MNIST.\n",
    "\n",
    "**Figure 6.MNIST: MNIST Data Set**\n",
    "![MNIST Data Set](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_8_mnist.png \"MNIST Data Set\")\n",
    "\n",
    "[Fashion-MNIST](https://www.kaggle.com/zalando-research/fashionmnist) is a dataset of [Zalando](https://jobs.zalando.com/tech/) 's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with a label from 10 classes. Fashion-MNIST is a direct drop-in replacement for the original [MNIST dataset](http://yann.lecun.com/exdb/mnist/) for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits. You can see this data in Figure 6.MNIST-FASHION.\n",
    "\n",
    "**Figure 6.MNIST-FASHION: MNIST Fashon Data Set**\n",
    "![mnist-fashion](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/mnist-fashion.png \"mnist-fashion\")\n",
    "\n",
    "The [CIFAR-10 and CIFAR-100](https://www.cs.toronto.edu/~kriz/cifar.html) datasets are also frequently used by the neural network research community.\n",
    "\n",
    "**Figure 6.CIFAR: CIFAR Data Set**\n",
    "![CIFAR Data Set](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_8_cifar.png \"CIFAR Data Set\")\n",
    "\n",
    "The CIFAR-10 data set contains low-rez images that are divided into 10 classes. The CIFAR-100 data set contains 100 classes in a hierarchy. \n",
    "\n",
    "## Convolutional Neural Networks (CNNs)\n",
    "\n",
    "The convolutional neural network (CNN) is a neural network technology that has profoundly impacted the area of computer vision (CV). Fukushima  (1980) [[Cite:fukushima1980neocognitron]](https://www.rctn.org/bruno/public/papers/Fukushima1980.pdf) introduced the original concept of a convolutional neural network, and   LeCun, Bottou, Bengio & Haffner (1998) [[Cite:lecun1995convolutional]](http://yann.lecun.com/exdb/publis/pdf/lecun-bengio-95a.pdf) greatly improved this work. From this research, Yan LeCun introduced the famous LeNet-5 neural network architecture. This chapter follows the LeNet-5 style of convolutional neural network.  \n",
    "Although computer vision primarily uses CNNs, this technology has some applications outside of the field. You need to realize that if you want to utilize CNNs on non-visual data, you must find a way to encode your data to mimic the properties of visual data.  \n",
    "\n",
    "The order of the input array elements is crucial to the training. In contrast, most neural networks that are not CNNs treat their input data as a long vector of values, and the order in which you arrange the incoming features in this vector is irrelevant. You cannot change the order for these types of neural networks after you have trained the network. \n",
    "\n",
    "The CNN network arranges the inputs into a grid. This arrangement worked well with images because the pixels in closer proximity to each other are important to each other. The order of pixels in an image is significant. The human body is a relevant example of this type of order. For the design of the face, we are accustomed to eyes being near to each other. \n",
    "\n",
    "This advance in CNNs is due to years of research on biological eyes. In other words, CNNs utilize overlapping fields of input to simulate features of biological eyes. Until this breakthrough, AI had been unable to reproduce the capabilities of biological vision.\n",
    "Scale, rotation, and noise have presented challenges for AI computer vision research. You can observe the complexity of biological eyes in the example that follows. A friend raises a sheet of paper with a large number written on it. As your friend moves nearer to you, the number is still identifiable. In the same way, you can still identify the number when your friend rotates the paper. Lastly, your friend creates noise by drawing lines on the page, but you can still identify the number. As you can see, these examples demonstrate the high function of the biological eye and allow you to understand better the research breakthrough of CNNs. That is, this neural network can process scale, rotation, and noise in the field of computer vision. You can see this network structure in Figure 6.LENET.\n",
    "\n",
    "**Figure 6.LENET: A LeNET-5 Network (LeCun, 1998)**\n",
    "![A LeNET-5 Network](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_8_lenet5.png \"A LeNET-5 Network\")\n",
    "\n",
    "So far, we have only seen one layer type (dense layers). By the end of this book we will have seen:\n",
    "\n",
    "* **Dense Layers** - Fully connected layers.  \n",
    "* **Convolution Layers** - Used to scan across images. \n",
    "* **Max Pooling Layers** - Used to downsample images. \n",
    "* **Dropout Layers** - Used to add regularization. \n",
    "* **LSTM and Transformer Layers** - Used for time series data.\n",
    "\n",
    "## Convolution Layers\n",
    "\n",
    "The first layer that we will examine is the convolutional layer. We will begin by looking at the hyper-parameters that you must specify for a convolutional layer in most neural network frameworks that support the CNN:\n",
    "\n",
    "* Number of filters\n",
    "* Filter Size\n",
    "* Stride\n",
    "* Padding\n",
    "* Activation Function/Non-Linearity\n",
    "\n",
    "The primary purpose of a convolutional layer is to detect features such as edges, lines, blobs of color, and other visual elements. The filters can detect these features. The more filters we give to a convolutional layer, the more features it can see.\n",
    "\n",
    "A filter is a square-shaped object that scans over the image. A grid can represent the individual pixels of a grid. You can think of the convolutional layer as a smaller grid that sweeps left to right over each image row. There is also a hyperparameter that specifies both the width and height of the square-shaped filter. The following figure shows this configuration in which you see the six convolutional filters sweeping over the image grid:\n",
    "\n",
    "A convolutional layer has weights between it and the previous layer or image grid. Each pixel on each convolutional layer is a weight. Therefore, the number of weights between a convolutional layer and its predecessor layer or image field is the following:\n",
    "\n",
    "```\n",
    "[FilterSize] * [FilterSize] * [# of Filters]\n",
    "```\n",
    "\n",
    "For example, if the filter size were 5 (5x5) for 10 filters, there would be 250 weights.\n",
    "\n",
    "You need to understand how the convolutional filters sweep across the previous layer's output or image grid. Figure 6.CNN illustrates the sweep:\n",
    "\n",
    "**Figure 6.CNN: Convolutional Neural Network**\n",
    "![Convolutional Neural Network](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_8_cnn_grid.png \"Convolutional Neural Network\")\n",
    "\n",
    "The above figure shows a convolutional filter with 4 and a padding size of 1. The padding size is responsible for the border of zeros in the area that the filter sweeps. Even though the image is 8x7, the extra padding provides a virtual image size of 9x8 for the filter to sweep across. The stride specifies the number of positions the convolutional filters will stop. The convolutional filters move to the right, advancing by the number of cells specified in the stride. Once you reach the far right, the convolutional filter moves back to the far left; then, it moves down by the stride amount and\n",
    "continues to the right again.\n",
    "\n",
    "Some constraints exist concerning the size of the stride. The stride cannot be 0. The convolutional filter would never move if you set the stride. Furthermore, neither the stride nor the convolutional filter size can be larger than the previous grid. There are additional constraints on the stride (*s*), padding (*p*), and the filter width (*f*) for an image of width (*w*). Specifically, the convolutional filter must be able to start at the far left or top border, move a certain number of strides, and land on the far right or bottom border. The following equation shows the number of steps a convolutional operator\n",
    "must take to cross the image:\n",
    "\n",
    "$$ steps = \\frac{w - f + 2p}{s}+1 $$\n",
    "\n",
    "The number of steps must be an integer. In other words, it cannot have decimal places. The purpose of the padding (*p*) is to be adjusted to make this equation become an integer value.\n",
    "\n",
    "## Max Pooling Layers\n",
    "\n",
    "Max-pool layers downsample a 3D box to a new one with smaller dimensions. Typically, you can always place a max-pool layer immediately following the convolutional layer. The LENET shows the max-pool layer immediately after layers C1 and C3. These max-pool layers progressively decrease the size of the dimensions of the 3D boxes passing through them. This technique can avoid overfitting (Krizhevsky, Sutskever & Hinton, 2012).\n",
    "\n",
    "A pooling layer has the following hyper-parameters:\n",
    "\n",
    "* Spatial Extent (*f*)\n",
    "* Stride (*s*)\n",
    "\n",
    "Unlike convolutional layers, max-pool layers do not use padding. Additionally, max-pool layers have no weights, so training does not affect them. These layers downsample their 3D box input. The 3D box output by a max-pool layer will have a width equal to this equation:\n",
    "\n",
    "$$ w_2 = \\frac{w_1 - f}{s} + 1 $$\n",
    "\n",
    "The height of the 3D box produced by the max-pool layer is calculated similarly with this equation:\n",
    "\n",
    "$$ h_2 = \\frac{h_1 - f}{s} + 1 $$\n",
    "\n",
    "The depth of the 3D box produced by the max-pool layer is equal to the depth the 3D box received as input. The most common setting for the hyper-parameters of a max-pool layer is f=2 and s=2. The spatial extent (f) specifies that boxes of 2x2 will be scaled down to single pixels. Of these four pixels, the pixel with the maximum value will represent the 2x2 pixel in the new grid. Because squares of size 4 are replaced with size 1, 75% of the pixel information is lost. The following figure shows this transformation as a 6x6 grid becomes a 3x3:\n",
    "\n",
    "**Figure 6.MAXPOOL: Max Pooling Layer**\n",
    "![Max Pooling Layer](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_8_conv_maxpool.png \"Max Pooling Layer\")\n",
    "\n",
    "Of course, the above diagram shows each pixel as a single number. A grayscale image would have this characteristic. We usually take the average of the three numbers for an RGB image to determine which pixel has the maximum value.\n",
    "\n",
    "## Regression Convolutional Neural Networks\n",
    "\n",
    "We will now look at two examples, one for regression and another for classification. For supervised computer vision, your dataset will need some labels. For classification, this label usually specifies what the image is a picture of. For regression, this \"label\" is some numeric quantity the image should produce, such as a count. We will look at two different means of providing this label.\n",
    "\n",
    "The first example will show how to handle regression with convolution neural networks. We will provide an image and expect the neural network to count items in that image. We will use a [dataset](https://www.kaggle.com/jeffheaton/count-the-paperclips) that I created that contains a random number of paperclips. The following code will download this dataset for you.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "GpfadrdQcVg8"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "URL = \"https://github.com/jeffheaton/data-mirror/releases/\"\n",
    "DOWNLOAD_SOURCE = URL+\"download/v1/paperclips.zip\"\n",
    "DOWNLOAD_NAME = DOWNLOAD_SOURCE[DOWNLOAD_SOURCE.rfind('/')+1:]\n",
    "\n",
    "if COLAB:\n",
    "  PATH = \"/content\"\n",
    "else:\n",
    "  # I used this locally on my machine, you may need different\n",
    "  PATH = \"/Users/jeff/temp\"\n",
    "\n",
    "EXTRACT_TARGET = os.path.join(PATH,\"clips\")\n",
    "SOURCE = os.path.join(EXTRACT_TARGET, \"paperclips\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ivCHAirHpNyT"
   },
   "source": [
    "Next, we download the images. This part depends on the origin of your images. The following code downloads images from a URL, where a ZIP file contains the images. The code unzips the ZIP file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "CExT2Z6gpAhz",
    "outputId": "44069f67-c041-45da-a4ab-67f5c135c1df"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2022-03-01 22:45:29--  https://github.com/jeffheaton/data-mirror/releases/download/v1/paperclips.zip\n",
      "Resolving github.com (github.com)... 140.82.121.4\n",
      "Connecting to github.com (github.com)|140.82.121.4|:443... connected.\n",
      "HTTP request sent, awaiting response... 302 Found\n",
      "Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/408419764/25830812-b9e6-4ddf-93b6-7932d9ef5982?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20220301%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220301T224529Z&X-Amz-Expires=300&X-Amz-Signature=92d63a15fadf8b244e13610e65457b6628f86d8465cb450a763a00f4e70fde8f&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=408419764&response-content-disposition=attachment%3B%20filename%3Dpaperclips.zip&response-content-type=application%2Foctet-stream [following]\n",
      "--2022-03-01 22:45:29--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/408419764/25830812-b9e6-4ddf-93b6-7932d9ef5982?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20220301%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220301T224529Z&X-Amz-Expires=300&X-Amz-Signature=92d63a15fadf8b244e13610e65457b6628f86d8465cb450a763a00f4e70fde8f&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=408419764&response-content-disposition=attachment%3B%20filename%3Dpaperclips.zip&response-content-type=application%2Foctet-stream\n",
      "Resolving objects.githubusercontent.com (objects.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n",
      "Connecting to objects.githubusercontent.com (objects.githubusercontent.com)|185.199.108.133|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 163590691 (156M) [application/octet-stream]\n",
      "Saving to: ‘/content/paperclips.zip’\n",
      "\n",
      "/content/paperclips 100%[===================>] 156.01M  22.9MB/s    in 6.0s    \n",
      "\n",
      "2022-03-01 22:45:35 (26.1 MB/s) - ‘/content/paperclips.zip’ saved [163590691/163590691]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# HIDE OUTPUT\n",
    "!wget -O {os.path.join(PATH,DOWNLOAD_NAME)} {DOWNLOAD_SOURCE}\n",
    "!mkdir -p {SOURCE}\n",
    "!mkdir -p {TARGET}\n",
    "!mkdir -p {EXTRACT_TARGET}\n",
    "!unzip -o -j -d {SOURCE} {os.path.join(PATH, DOWNLOAD_NAME)} >/dev/null"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "p4LWl8TzzI8a"
   },
   "source": [
    "The labels are contained in a CSV file named **train.csv**for regression. This file has just two labels, **id** and **clip_count**. The ID specifies the filename; for example, row id 1 corresponds to the file **clips-1.jpg**. The following code loads the labels for the training set and creates a new column, named **filename**, that contains the filename of each image, based on the **id** column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "id": "w4W4LeGSqYya"
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df = pd.read_csv(\n",
    "    os.path.join(SOURCE,\"train.csv\"), \n",
    "    na_values=['NA', '?'])\n",
    "\n",
    "df['filename']=\"clips-\"+df[\"id\"].astype(str)+\".jpg\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "AGxjITNtz0sC"
   },
   "source": [
    "This results in the following dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 423
    },
    "id": "rJpaMpmfswIp",
    "outputId": "c6462003-f016-4f0c-b8a8-81f276b9b3f4"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "  <div id=\"df-b98c878f-7d23-4f00-b807-e59d48e13378\">\n",
       "    <div class=\"colab-df-container\">\n",
       "      <div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>clip_count</th>\n",
       "      <th>filename</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>30001</td>\n",
       "      <td>11</td>\n",
       "      <td>clips-30001.jpg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>30002</td>\n",
       "      <td>2</td>\n",
       "      <td>clips-30002.jpg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>30003</td>\n",
       "      <td>26</td>\n",
       "      <td>clips-30003.jpg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>30004</td>\n",
       "      <td>41</td>\n",
       "      <td>clips-30004.jpg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>30005</td>\n",
       "      <td>49</td>\n",
       "      <td>clips-30005.jpg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19995</th>\n",
       "      <td>49996</td>\n",
       "      <td>35</td>\n",
       "      <td>clips-49996.jpg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19996</th>\n",
       "      <td>49997</td>\n",
       "      <td>54</td>\n",
       "      <td>clips-49997.jpg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19997</th>\n",
       "      <td>49998</td>\n",
       "      <td>72</td>\n",
       "      <td>clips-49998.jpg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19998</th>\n",
       "      <td>49999</td>\n",
       "      <td>24</td>\n",
       "      <td>clips-49999.jpg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19999</th>\n",
       "      <td>50000</td>\n",
       "      <td>35</td>\n",
       "      <td>clips-50000.jpg</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>20000 rows × 3 columns</p>\n",
       "</div>\n",
       "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-b98c878f-7d23-4f00-b807-e59d48e13378')\"\n",
       "              title=\"Convert this dataframe to an interactive table.\"\n",
       "              style=\"display:none;\">\n",
       "        \n",
       "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
       "       width=\"24px\">\n",
       "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
       "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
       "  </svg>\n",
       "      </button>\n",
       "      \n",
       "  <style>\n",
       "    .colab-df-container {\n",
       "      display:flex;\n",
       "      flex-wrap:wrap;\n",
       "      gap: 12px;\n",
       "    }\n",
       "\n",
       "    .colab-df-convert {\n",
       "      background-color: #E8F0FE;\n",
       "      border: none;\n",
       "      border-radius: 50%;\n",
       "      cursor: pointer;\n",
       "      display: none;\n",
       "      fill: #1967D2;\n",
       "      height: 32px;\n",
       "      padding: 0 0 0 0;\n",
       "      width: 32px;\n",
       "    }\n",
       "\n",
       "    .colab-df-convert:hover {\n",
       "      background-color: #E2EBFA;\n",
       "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
       "      fill: #174EA6;\n",
       "    }\n",
       "\n",
       "    [theme=dark] .colab-df-convert {\n",
       "      background-color: #3B4455;\n",
       "      fill: #D2E3FC;\n",
       "    }\n",
       "\n",
       "    [theme=dark] .colab-df-convert:hover {\n",
       "      background-color: #434B5C;\n",
       "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
       "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
       "      fill: #FFFFFF;\n",
       "    }\n",
       "  </style>\n",
       "\n",
       "      <script>\n",
       "        const buttonEl =\n",
       "          document.querySelector('#df-b98c878f-7d23-4f00-b807-e59d48e13378 button.colab-df-convert');\n",
       "        buttonEl.style.display =\n",
       "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
       "\n",
       "        async function convertToInteractive(key) {\n",
       "          const element = document.querySelector('#df-b98c878f-7d23-4f00-b807-e59d48e13378');\n",
       "          const dataTable =\n",
       "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
       "                                                     [key], {});\n",
       "          if (!dataTable) return;\n",
       "\n",
       "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
       "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
       "            + ' to learn more about interactive tables.';\n",
       "          element.innerHTML = '';\n",
       "          dataTable['output_type'] = 'display_data';\n",
       "          await google.colab.output.renderOutput(dataTable, element);\n",
       "          const docLink = document.createElement('div');\n",
       "          docLink.innerHTML = docLinkHtml;\n",
       "          element.appendChild(docLink);\n",
       "        }\n",
       "      </script>\n",
       "    </div>\n",
       "  </div>\n",
       "  "
      ],
      "text/plain": [
       "          id  clip_count         filename\n",
       "0      30001          11  clips-30001.jpg\n",
       "1      30002           2  clips-30002.jpg\n",
       "2      30003          26  clips-30003.jpg\n",
       "3      30004          41  clips-30004.jpg\n",
       "4      30005          49  clips-30005.jpg\n",
       "...      ...         ...              ...\n",
       "19995  49996          35  clips-49996.jpg\n",
       "19996  49997          54  clips-49997.jpg\n",
       "19997  49998          72  clips-49998.jpg\n",
       "19998  49999          24  clips-49999.jpg\n",
       "19999  50000          35  clips-50000.jpg\n",
       "\n",
       "[20000 rows x 3 columns]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "nN1AsQeysmGd"
   },
   "source": [
    "Separate into a training and validation (for early stopping)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "CI4_qNaSqp31",
    "outputId": "abe450db-ce00-44ac-cbbd-166912687df4"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training size: 18000\n",
      "Validate size: 2000\n"
     ]
    }
   ],
   "source": [
    "TRAIN_PCT = 0.9\n",
    "TRAIN_CUT = int(len(df) * TRAIN_PCT)\n",
    "\n",
    "df_train = df[0:TRAIN_CUT]\n",
    "df_validate = df[TRAIN_CUT:]\n",
    "\n",
    "print(f\"Training size: {len(df_train)}\")\n",
    "print(f\"Validate size: {len(df_validate)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "KNoqk5uIz6FW"
   },
   "source": [
    "We are now ready to create two ImageDataGenerator objects. We currently use a generator, which creates additional training data by manipulating the source material. This technique can produce considerably stronger neural networks. The generator below flips the images both vertically and horizontally. Keras will train the neuron network both on the original images and the flipped images. This augmentation increases the size of the training data considerably. Module 6.4 goes deeper into the transformations you can perform. You can also specify a target size to resize the images automatically.\n",
    "\n",
    "The function **flow_from_dataframe** loads the labels from a Pandas dataframe connected to our **train.csv** file. When we demonstrate classification, we will use the **flow_from_directory**; which loads the labels from the directory structure rather than a CSV."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "YZzeAHdfsy0O",
    "outputId": "c6741a3f-bf89-4e9e-a730-6559134fcc98"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Found 18000 validated image filenames.\n",
      "Found 2000 validated image filenames.\n"
     ]
    }
   ],
   "source": [
    "import tensorflow as tf\n",
    "import keras_preprocessing\n",
    "from keras_preprocessing import image\n",
    "from keras_preprocessing.image import ImageDataGenerator\n",
    "\n",
    "training_datagen = ImageDataGenerator(\n",
    "  rescale = 1./255,\n",
    "  horizontal_flip=True,\n",
    "  vertical_flip=True,\n",
    "  fill_mode='nearest')\n",
    "\n",
    "train_generator = training_datagen.flow_from_dataframe(\n",
    "        dataframe=df_train,\n",
    "        directory=SOURCE,\n",
    "        x_col=\"filename\",\n",
    "        y_col=\"clip_count\",\n",
    "        target_size=(256, 256),\n",
    "        batch_size=32,\n",
    "        class_mode='other')\n",
    "\n",
    "validation_datagen = ImageDataGenerator(rescale = 1./255)\n",
    "\n",
    "val_generator = validation_datagen.flow_from_dataframe(\n",
    "        dataframe=df_validate,\n",
    "        directory=SOURCE,\n",
    "        x_col=\"filename\",\n",
    "        y_col=\"clip_count\",\n",
    "        target_size=(256, 256),\n",
    "        class_mode='other')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "_GVeYo6p2sdG"
   },
   "source": [
    "We can now train the neural network. The code to build and train the neural network is not that different than in the previous modules. We will use the Keras Sequential class to provide layers to the neural network. We now have several new layer types that we did not previously see.\n",
    "\n",
    "* **Conv2D** - The convolution layers.\n",
    "* **MaxPooling2D** - The max-pooling layers.\n",
    "* **Flatten** - Flatten the 2D (and higher) tensors to allow a Dense layer to process.\n",
    "* **Dense** - Dense layers, the same as demonstrated previously. Dense layers often form the final output layers of the neural network.\n",
    "\n",
    "The training code is very similar to previously. This code is for regression, so a final linear activation is used, along with mean_squared_error for the loss function. The generator provides both the *x* and *y* matrixes we previously supplied."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "DBHlqCIgtWSq",
    "outputId": "89e48c26-3776-41fa-bff0-bb2b6a39d5d8"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model: \"sequential\"\n",
      "_________________________________________________________________\n",
      " Layer (type)                Output Shape              Param #   \n",
      "=================================================================\n",
      " conv2d (Conv2D)             (None, 254, 254, 64)      1792      \n",
      "                                                                 \n",
      " max_pooling2d (MaxPooling2D  (None, 127, 127, 64)     0         \n",
      " )                                                               \n",
      "                                                                 \n",
      " conv2d_1 (Conv2D)           (None, 125, 125, 64)      36928     \n",
      "                                                                 \n",
      " max_pooling2d_1 (MaxPooling  (None, 62, 62, 64)       0         \n",
      " 2D)                                                             \n",
      "                                                                 \n",
      " flatten (Flatten)           (None, 246016)            0         \n",
      "                                                                 \n",
      " dense (Dense)               (None, 512)               125960704 \n",
      "                                                                 \n",
      " dense_1 (Dense)             (None, 1)                 513       \n",
      "                                                                 \n",
      "=================================================================\n",
      "Total params: 125,999,937\n",
      "Trainable params: 125,999,937\n",
      "Non-trainable params: 0\n",
      "_________________________________________________________________\n",
      "Epoch 1/25\n",
      "563/563 [==============================] - 64s 96ms/step - loss: 193.3214 - val_loss: 25.4486\n",
      "Epoch 2/25\n",
      "563/563 [==============================] - 52s 92ms/step - loss: 25.5836 - val_loss: 13.8235\n",
      "Epoch 3/25\n",
      "563/563 [==============================] - 53s 93ms/step - loss: 18.8956 - val_loss: 12.4469\n",
      "Epoch 4/25\n",
      "563/563 [==============================] - 52s 92ms/step - loss: 18.8489 - val_loss: 20.0634\n",
      "Epoch 5/25\n",
      "563/563 [==============================] - 52s 92ms/step - loss: 16.5164 - val_loss: 17.8989\n",
      "Epoch 6/25\n",
      "563/563 [==============================] - 52s 91ms/step - loss: 15.5483 - val_loss: 16.3132\n",
      "Epoch 7/25\n",
      "563/563 [==============================] - 52s 92ms/step - loss: 15.6795 - val_loss: 16.9717\n",
      "Epoch 8/25\n",
      "563/563 [==============================] - 52s 92ms/step - loss: 11.5606 - val_loss: 12.1518\n",
      "Epoch 9/25\n",
      "563/563 [==============================] - 52s 92ms/step - loss: 26.0139 - val_loss: 34.7480\n",
      "Epoch 10/25\n",
      "563/563 [==============================] - 52s 93ms/step - loss: 13.2884 - val_loss: 12.1712\n",
      "Epoch 11/25\n",
      "563/563 [==============================] - 52s 93ms/step - loss: 10.7682 - val_loss: 12.7467\n",
      "Epoch 12/25\n",
      "563/563 [==============================] - 53s 94ms/step - loss: 9.8051 - val_loss: 9.8815\n",
      "Epoch 13/25\n",
      "563/563 [==============================] - 62s 109ms/step - loss: 8.2021 - val_loss: 8.1277\n",
      "Epoch 14/25\n",
      "563/563 [==============================] - 52s 93ms/step - loss: 7.0807 - val_loss: 6.8239\n",
      "Epoch 15/25\n",
      "563/563 [==============================] - 52s 93ms/step - loss: 6.6521 - val_loss: 7.0292\n",
      "Epoch 16/25\n",
      "563/563 [==============================] - 52s 92ms/step - loss: 5.2696 - val_loss: 6.5994\n",
      "Epoch 17/25\n",
      "563/563 [==============================] - 52s 93ms/step - loss: 5.1749 - val_loss: 12.0623\n",
      "Epoch 18/25\n",
      "563/563 [==============================] - 52s 92ms/step - loss: 27.0990 - val_loss: 15.1000\n",
      "Epoch 19/25\n",
      "563/563 [==============================] - 52s 92ms/step - loss: 10.3702 - val_loss: 6.4094\n",
      "Epoch 20/25\n",
      "563/563 [==============================] - 54s 96ms/step - loss: 5.1338 - val_loss: 5.1066\n",
      "Epoch 21/25\n",
      "563/563 [==============================] - 60s 107ms/step - loss: 4.1413 - val_loss: 6.3927\n",
      "Epoch 22/25\n",
      "563/563 [==============================] - 55s 97ms/step - loss: 3.8659 - val_loss: 4.6390\n",
      "Epoch 23/25\n",
      "563/563 [==============================] - 55s 98ms/step - loss: 3.4385 - val_loss: 4.1845\n",
      "Epoch 24/25\n",
      "563/563 [==============================] - 54s 95ms/step - loss: 3.2399 - val_loss: 4.0449\n",
      "Epoch 25/25\n",
      "563/563 [==============================] - 53s 94ms/step - loss: 3.2823 - val_loss: 4.4899\n",
      "Elapsed time: 0:22:22.78\n"
     ]
    }
   ],
   "source": [
    "from tensorflow.keras.callbacks import EarlyStopping\n",
    "import time\n",
    "\n",
    "model = tf.keras.models.Sequential([\n",
    "    # Note the input shape is the desired size of the image 150x150 \n",
    "    # with 3 bytes color.\n",
    "    # This is the first convolution\n",
    "    tf.keras.layers.Conv2D(64, (3,3), activation='relu', \n",
    "        input_shape=(256, 256, 3)),\n",
    "    tf.keras.layers.MaxPooling2D(2, 2),\n",
    "    # The second convolution\n",
    "    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),\n",
    "    tf.keras.layers.MaxPooling2D(2,2),\n",
    "    tf.keras.layers.Flatten(),\n",
    "    # 512 neuron hidden layer\n",
    "    tf.keras.layers.Dense(512, activation='relu'),\n",
    "    tf.keras.layers.Dense(1, activation='linear')\n",
    "])\n",
    "\n",
    "\n",
    "model.summary()\n",
    "epoch_steps = 250 # needed for 2.2\n",
    "validation_steps = len(df_validate)\n",
    "model.compile(loss = 'mean_squared_error', optimizer='adam')\n",
    "monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, \n",
    "        patience=5, verbose=1, mode='auto',\n",
    "        restore_best_weights=True)\n",
    "\n",
    "start_time = time.time()\n",
    "history = model.fit(train_generator,  \n",
    "  verbose = 1, \n",
    "  validation_data=val_generator, callbacks=[monitor], epochs=25)\n",
    "\n",
    "elapsed_time = time.time() - start_time\n",
    "print(\"Elapsed time: {}\".format(hms_string(elapsed_time)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "gNiELOX53PtU"
   },
   "source": [
    "This code will run very slowly if you do not use a GPU. The above code takes approximately 13 minutes with a GPU.\n",
    "\n",
    "## Score Regression Image Data\n",
    "\n",
    "Scoring/predicting from a generator is a bit different than training. We do not want augmented images, and we do not wish to have the dataset shuffled. For scoring, we want a prediction for each input. We construct the generator as follows:\n",
    "\n",
    "* shuffle=False\n",
    "* batch_size=1\n",
    "* class_mode=None\n",
    "\n",
    "We use a **batch_size** of 1 to guarantee that we do not run out of GPU memory if our prediction set is large. You can increase this value for better performance. The **class_mode** is None because there is no *y*, or label. After all, we are predicting."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "QFXvtr-NtlQt",
    "outputId": "01c75325-c863-4fdd-b6a2-22f9b397ec69"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Found 5000 validated image filenames.\n"
     ]
    }
   ],
   "source": [
    "df_test = pd.read_csv(\n",
    "    os.path.join(SOURCE,\"test.csv\"), \n",
    "    na_values=['NA', '?'])\n",
    "\n",
    "df_test['filename']=\"clips-\"+df_test[\"id\"].astype(str)+\".jpg\"\n",
    "\n",
    "test_datagen = ImageDataGenerator(rescale = 1./255)\n",
    "\n",
    "test_generator = validation_datagen.flow_from_dataframe(\n",
    "        dataframe=df_test,\n",
    "        directory=SOURCE,\n",
    "        x_col=\"filename\",\n",
    "        batch_size=1,\n",
    "        shuffle=False,\n",
    "        target_size=(256, 256),\n",
    "        class_mode=None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "TOuNIlpLAF5A"
   },
   "source": [
    "We need to reset the generator to ensure we are always at the beginning."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "id": "gS0CjJ4bt8jQ"
   },
   "outputs": [],
   "source": [
    "test_generator.reset()\n",
    "pred = model.predict(test_generator,steps=len(df_test))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now generate a CSV file to hold the predictions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "id": "4n-t8k5bt_nG"
   },
   "outputs": [],
   "source": [
    "df_submit = pd.DataFrame({'id':df_test['id'],'clip_count':pred.flatten()})\n",
    "df_submit.to_csv(os.path.join(PATH,\"submit.csv\"),index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "sicFJd8u5c3v"
   },
   "source": [
    "## Classification Neural Networks\n",
    "\n",
    "Just like earlier in this module, we will load data. However, this time we will use a dataset of images of three different types of the iris flower. This zip file contains three different directories that specify each image's label. The directories are named the same as the labels:\n",
    "\n",
    "* iris-setosa\n",
    "* iris-versicolour\n",
    "* iris-virginica\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "id": "zxeLaa1c5gGA"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "URL = \"https://github.com/jeffheaton/data-mirror/releases\"\n",
    "DOWNLOAD_SOURCE = URL+\"/download/v1/iris-image.zip\"\n",
    "DOWNLOAD_NAME = DOWNLOAD_SOURCE[DOWNLOAD_SOURCE.rfind('/')+1:]\n",
    "\n",
    "if COLAB:\n",
    "  PATH = \"/content\"\n",
    "  EXTRACT_TARGET = os.path.join(PATH,\"iris\")\n",
    "  SOURCE = EXTRACT_TARGET # In this case its the same, no subfolder\n",
    "else:\n",
    "  # I used this locally on my machine, you may need different\n",
    "  PATH = \"/Users/jeff/temp\"\n",
    "  EXTRACT_TARGET = os.path.join(PATH,\"iris\")\n",
    "  SOURCE = EXTRACT_TARGET # In this case its the same, no subfolder"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "hset2s3P9MFV"
   },
   "source": [
    "Just as before, we unzip the images."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "9sQdvl9F6Xru",
    "outputId": "51172c54-1fd8-4f85-a71e-fd899f7d7aaa"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2022-03-01 23:08:29--  https://github.com/jeffheaton/data-mirror/releases/download/v1/iris-image.zip\n",
      "Resolving github.com (github.com)... 140.82.121.4\n",
      "Connecting to github.com (github.com)|140.82.121.4|:443... connected.\n",
      "HTTP request sent, awaiting response... 302 Found\n",
      "Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/408419764/d548babd-36c3-414e-add2-a5d9ab941e6e?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20220301%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220301T230829Z&X-Amz-Expires=300&X-Amz-Signature=f9315f39373d1b8e6ae60a618db5da58714c8bab017e0f55b38c7c0a55d2cb20&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=408419764&response-content-disposition=attachment%3B%20filename%3Diris-image.zip&response-content-type=application%2Foctet-stream [following]\n",
      "--2022-03-01 23:08:30--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/408419764/d548babd-36c3-414e-add2-a5d9ab941e6e?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20220301%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220301T230829Z&X-Amz-Expires=300&X-Amz-Signature=f9315f39373d1b8e6ae60a618db5da58714c8bab017e0f55b38c7c0a55d2cb20&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=408419764&response-content-disposition=attachment%3B%20filename%3Diris-image.zip&response-content-type=application%2Foctet-stream\n",
      "Resolving objects.githubusercontent.com (objects.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.108.133, ...\n",
      "Connecting to objects.githubusercontent.com (objects.githubusercontent.com)|185.199.109.133|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 5587253 (5.3M) [application/octet-stream]\n",
      "Saving to: ‘/content/iris-image.zip’\n",
      "\n",
      "/content/iris-image 100%[===================>]   5.33M  6.65MB/s    in 0.8s    \n",
      "\n",
      "2022-03-01 23:08:31 (6.65 MB/s) - ‘/content/iris-image.zip’ saved [5587253/5587253]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# HIDE OUTPUT\n",
    "!wget -O {os.path.join(PATH,DOWNLOAD_NAME)} {DOWNLOAD_SOURCE}\n",
    "!mkdir -p {SOURCE}\n",
    "!mkdir -p {TARGET}\n",
    "!mkdir -p {EXTRACT_TARGET}\n",
    "!unzip -o -d {EXTRACT_TARGET} {os.path.join(PATH, DOWNLOAD_NAME)} >/dev/null"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "bS1p_9Mn9EPz"
   },
   "source": [
    "You can see these folders with the following command."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "UrtRO9O-SBQc",
    "outputId": "15f20490-a321-4348-844a-fcc24a0c7cf7"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "iris-setosa  iris-versicolour  iris-virginica\n"
     ]
    }
   ],
   "source": [
    "!ls /content/iris"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "cYDFKA8i9KMP"
   },
   "source": [
    "We set up the generator, similar to before.  This time we use flow_from_directory to get the labels from the directory structure."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "u7EgpVqAdGvI",
    "outputId": "7c9cc17b-5425-4378-e0ca-561a68fbb9ce"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Found 421 images belonging to 3 classes.\n",
      "Found 421 images belonging to 3 classes.\n"
     ]
    }
   ],
   "source": [
    "import tensorflow as tf\n",
    "import keras_preprocessing\n",
    "from keras_preprocessing import image\n",
    "from keras_preprocessing.image import ImageDataGenerator\n",
    "\n",
    "training_datagen = ImageDataGenerator(\n",
    "  rescale = 1./255,\n",
    "  horizontal_flip=True,\n",
    "  vertical_flip=True,\n",
    "  width_shift_range=[-200,200],\n",
    "  rotation_range=360,\n",
    "\n",
    "  fill_mode='nearest')\n",
    "\n",
    "train_generator = training_datagen.flow_from_directory(\n",
    "    directory=SOURCE, target_size=(256, 256), \n",
    "    class_mode='categorical', batch_size=32, shuffle=True)\n",
    "\n",
    "validation_datagen = ImageDataGenerator(rescale = 1./255)\n",
    "\n",
    "validation_generator = validation_datagen.flow_from_directory(\n",
    "    directory=SOURCE, target_size=(256, 256), \n",
    "    class_mode='categorical', batch_size=32, shuffle=True)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "AFesQtNZBZP0"
   },
   "source": [
    "Training the neural network with classification is similar to regression. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "MBnwiM-XflQc",
    "outputId": "3279a8b4-3828-4ad3-8c03-6361cb10a975"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model: \"sequential_1\"\n",
      "_________________________________________________________________\n",
      " Layer (type)                Output Shape              Param #   \n",
      "=================================================================\n",
      " conv2d_2 (Conv2D)           (None, 254, 254, 16)      448       \n",
      "                                                                 \n",
      " max_pooling2d_2 (MaxPooling  (None, 127, 127, 16)     0         \n",
      " 2D)                                                             \n",
      "                                                                 \n",
      " conv2d_3 (Conv2D)           (None, 125, 125, 32)      4640      \n",
      "                                                                 \n",
      " dropout (Dropout)           (None, 125, 125, 32)      0         \n",
      "                                                                 \n",
      " max_pooling2d_3 (MaxPooling  (None, 62, 62, 32)       0         \n",
      " 2D)                                                             \n",
      "                                                                 \n",
      " conv2d_4 (Conv2D)           (None, 60, 60, 64)        18496     \n",
      "                                                                 \n",
      " dropout_1 (Dropout)         (None, 60, 60, 64)        0         \n",
      "                                                                 \n",
      " max_pooling2d_4 (MaxPooling  (None, 30, 30, 64)       0         \n",
      " 2D)                                                             \n",
      "                                                                 \n",
      " conv2d_5 (Conv2D)           (None, 28, 28, 64)        36928     \n",
      "                                                                 \n",
      " max_pooling2d_5 (MaxPooling  (None, 14, 14, 64)       0         \n",
      " 2D)                                                             \n",
      "                                                                 \n",
      " conv2d_6 (Conv2D)           (None, 12, 12, 64)        36928     \n",
      "                                                                 \n",
      " max_pooling2d_6 (MaxPooling  (None, 6, 6, 64)         0         \n",
      " 2D)                                                             \n",
      "                                                                 \n",
      " flatten_1 (Flatten)         (None, 2304)              0         \n",
      "                                                                 \n",
      " dropout_2 (Dropout)         (None, 2304)              0         \n",
      "                                                                 \n",
      " dense_2 (Dense)             (None, 512)               1180160   \n",
      "                                                                 \n",
      " dense_3 (Dense)             (None, 3)                 1539      \n",
      "                                                                 \n",
      "=================================================================\n",
      "Total params: 1,279,139\n",
      "Trainable params: 1,279,139\n",
      "Non-trainable params: 0\n",
      "_________________________________________________________________\n",
      "Epoch 1/50\n",
      "10/10 [==============================] - 6s 486ms/step - loss: 1.0254\n",
      "Epoch 2/50\n",
      "10/10 [==============================] - 5s 472ms/step - loss: 0.9060\n",
      "Epoch 3/50\n",
      "10/10 [==============================] - 5s 474ms/step - loss: 0.9712\n",
      "Epoch 4/50\n",
      "10/10 [==============================] - 5s 520ms/step - loss: 0.9099\n",
      "Epoch 5/50\n",
      "10/10 [==============================] - 5s 517ms/step - loss: 0.9061\n",
      "Epoch 6/50\n",
      "10/10 [==============================] - 5s 510ms/step - loss: 0.8965\n",
      "Epoch 7/50\n",
      "10/10 [==============================] - 5s 458ms/step - loss: 0.8909\n",
      "Epoch 8/50\n",
      "10/10 [==============================] - 5s 514ms/step - loss: 0.8941\n",
      "Epoch 9/50\n",
      "10/10 [==============================] - 5s 493ms/step - loss: 0.9248\n",
      "Epoch 10/50\n",
      "10/10 [==============================] - 5s 500ms/step - loss: 0.8780\n",
      "Epoch 11/50\n",
      "10/10 [==============================] - 5s 453ms/step - loss: 0.8724\n",
      "Epoch 12/50\n",
      "10/10 [==============================] - 5s 448ms/step - loss: 0.8901\n",
      "Epoch 13/50\n",
      "10/10 [==============================] - 5s 456ms/step - loss: 0.8817\n",
      "Epoch 14/50\n",
      "10/10 [==============================] - 5s 465ms/step - loss: 0.9040\n",
      "Epoch 15/50\n",
      "10/10 [==============================] - 5s 449ms/step - loss: 0.8779\n",
      "Epoch 16/50\n",
      "10/10 [==============================] - 4s 441ms/step - loss: 0.8479\n",
      "Epoch 17/50\n",
      "10/10 [==============================] - 5s 499ms/step - loss: 0.8713\n",
      "Epoch 18/50\n",
      "10/10 [==============================] - 5s 456ms/step - loss: 0.8432\n",
      "Epoch 19/50\n",
      "10/10 [==============================] - 4s 444ms/step - loss: 0.8816\n",
      "Epoch 20/50\n",
      "10/10 [==============================] - 5s 508ms/step - loss: 0.8791\n",
      "Epoch 21/50\n",
      "10/10 [==============================] - 5s 497ms/step - loss: 0.8553\n",
      "Epoch 22/50\n",
      "10/10 [==============================] - 5s 448ms/step - loss: 0.8275\n",
      "Epoch 23/50\n",
      "10/10 [==============================] - 5s 502ms/step - loss: 0.8216\n",
      "Epoch 24/50\n",
      "10/10 [==============================] - 5s 456ms/step - loss: 0.8739\n",
      "Epoch 25/50\n",
      "10/10 [==============================] - 5s 510ms/step - loss: 0.8650\n",
      "Epoch 26/50\n",
      "10/10 [==============================] - 5s 456ms/step - loss: 0.8405\n",
      "Epoch 27/50\n",
      "10/10 [==============================] - 5s 456ms/step - loss: 0.8729\n",
      "Epoch 28/50\n",
      "10/10 [==============================] - 5s 499ms/step - loss: 0.8618\n",
      "Epoch 29/50\n",
      "10/10 [==============================] - 5s 500ms/step - loss: 0.8125\n",
      "Epoch 30/50\n",
      "10/10 [==============================] - 5s 504ms/step - loss: 0.8813\n",
      "Epoch 31/50\n",
      "10/10 [==============================] - 5s 508ms/step - loss: 0.8392\n",
      "Epoch 32/50\n",
      "10/10 [==============================] - 5s 449ms/step - loss: 0.8377\n",
      "Epoch 33/50\n",
      "10/10 [==============================] - 5s 499ms/step - loss: 0.8509\n",
      "Epoch 34/50\n",
      "10/10 [==============================] - 5s 454ms/step - loss: 0.8647\n",
      "Epoch 35/50\n",
      "10/10 [==============================] - 5s 466ms/step - loss: 0.8874\n",
      "Epoch 36/50\n",
      "10/10 [==============================] - 5s 502ms/step - loss: 0.9221\n",
      "Epoch 37/50\n",
      "10/10 [==============================] - 5s 511ms/step - loss: 0.9186\n",
      "Epoch 38/50\n",
      "10/10 [==============================] - 5s 496ms/step - loss: 0.8549\n",
      "Epoch 39/50\n",
      "10/10 [==============================] - 5s 493ms/step - loss: 0.9194\n",
      "Epoch 40/50\n",
      "10/10 [==============================] - 5s 496ms/step - loss: 0.8528\n",
      "Epoch 41/50\n",
      "10/10 [==============================] - 5s 453ms/step - loss: 0.9105\n",
      "Epoch 42/50\n",
      "10/10 [==============================] - 5s 454ms/step - loss: 0.8462\n",
      "Epoch 43/50\n",
      "10/10 [==============================] - 5s 459ms/step - loss: 0.8858\n",
      "Epoch 44/50\n",
      "10/10 [==============================] - 5s 497ms/step - loss: 0.9119\n",
      "Epoch 45/50\n",
      "10/10 [==============================] - 5s 458ms/step - loss: 0.8799\n",
      "Epoch 46/50\n",
      "10/10 [==============================] - 5s 499ms/step - loss: 0.8582\n",
      "Epoch 47/50\n",
      "10/10 [==============================] - 5s 493ms/step - loss: 0.8536\n",
      "Epoch 48/50\n",
      "10/10 [==============================] - 5s 490ms/step - loss: 0.8669\n",
      "Epoch 49/50\n",
      "10/10 [==============================] - 5s 458ms/step - loss: 0.7957\n",
      "Epoch 50/50\n",
      "10/10 [==============================] - 5s 501ms/step - loss: 0.8670\n"
     ]
    }
   ],
   "source": [
    "from tensorflow.keras.callbacks import EarlyStopping\n",
    "\n",
    "class_count = len(train_generator.class_indices)\n",
    "\n",
    "model = tf.keras.models.Sequential([\n",
    "    # Note the input shape is the desired size of the image \n",
    "    # 300x300 with 3 bytes color\n",
    "    # This is the first convolution\n",
    "    tf.keras.layers.Conv2D(16, (3,3), activation='relu', \n",
    "        input_shape=(256, 256, 3)),\n",
    "    tf.keras.layers.MaxPooling2D(2, 2),\n",
    "    # The second convolution\n",
    "    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),\n",
    "    tf.keras.layers.Dropout(0.5),\n",
    "    tf.keras.layers.MaxPooling2D(2,2),\n",
    "    # The third convolution\n",
    "    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),\n",
    "    tf.keras.layers.Dropout(0.5),\n",
    "    tf.keras.layers.MaxPooling2D(2,2),\n",
    "    # The fourth convolution\n",
    "    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),\n",
    "    tf.keras.layers.MaxPooling2D(2,2),\n",
    "    # The fifth convolution\n",
    "    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),\n",
    "    tf.keras.layers.MaxPooling2D(2,2),\n",
    "    # Flatten the results to feed into a DNN\n",
    "    \n",
    "    tf.keras.layers.Flatten(),\n",
    "    tf.keras.layers.Dropout(0.5),\n",
    "    # 512 neuron hidden layer\n",
    "    tf.keras.layers.Dense(512, activation='relu'),\n",
    "    # Only 1 output neuron. It will contain a value from 0-1 \n",
    "    tf.keras.layers.Dense(class_count, activation='softmax')\n",
    "])\n",
    "\n",
    "model.summary()\n",
    "\n",
    "model.compile(loss = 'categorical_crossentropy', optimizer='adam')\n",
    "\n",
    "model.fit(train_generator, epochs=50, steps_per_epoch=10, \n",
    "                    verbose = 1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "fmjjtfuv4R9-"
   },
   "source": [
    "The iris image dataset is not easy to predict; it turns out that a tabular dataset of measurements is more manageable.  However, we can achieve a 63%. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "BUvoMBK5uYKs",
    "outputId": "f8091a91-f841-45a9-af9a-b531b0741899"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Accuracy: 0.6389548693586699\n"
     ]
    }
   ],
   "source": [
    "from sklearn.metrics import accuracy_score\n",
    "import numpy as np\n",
    "\n",
    "validation_generator.reset()\n",
    "pred = model.predict(validation_generator)\n",
    "\n",
    "predict_classes = np.argmax(pred,axis=1)\n",
    "expected_classes = validation_generator.classes\n",
    "\n",
    "correct = accuracy_score(expected_classes,predict_classes)\n",
    "print(f\"Accuracy: {correct}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "QFC6ukZ9cTbp"
   },
   "source": [
    "\n",
    "# Other Resources\n",
    "\n",
    "* [Imagenet:Large Scale Visual Recognition Challenge 2014](http://image-net.org/challenges/LSVRC/2014/index)\n",
    "* [Andrej Karpathy](http://cs.stanford.edu/people/karpathy/) - PhD student/instructor at Stanford.\n",
    "* [CS231n Convolutional Neural Networks for Visual Recognition](http://cs231n.stanford.edu/) - Stanford course on computer vision/CNN's.\n",
    "* [CS231n - GitHub](http://cs231n.github.io/)\n",
    "* [ConvNetJS](http://cs.stanford.edu/people/karpathy/convnetjs/) - JavaScript library for deep learning."
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "anaconda-cloud": {},
  "colab": {
   "collapsed_sections": [],
   "name": "Copy of Copy of t81_558_class_06_2_cnn.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3.9 (tensorflow)",
   "language": "python",
   "name": "tensorflow"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
